Video analytics encoding for improved efficiency of video processing and compression

ABSTRACT

Embodiments are generally directed to video analytics encoding for improved efficiency of video processing and compression. An embodiment of an apparatus includes a memory to store data, including data for video streaming, and a video processing mechanism, wherein the video processing mechanism is to analyze video data and generate video analytics, generate metadata representing the video analytics and insert the generated video analytics metadata into a message, and transmit the video data and the metadata to a succeeding apparatus or system in a video analytics pipeline, the video data being compressed video data.

CLAIM OF PRIORITY

This Application is a continuation of and claims the benefit of and priority to U.S. application Ser. No. 16/235,574, entitled VIDEO ANALYTICS ENCODING FOR IMPROVED EFFICIENCY OF VIDEO PROCESSING AND COMPRESSION, by Palanivel Guruva Reddiar, et al., filed Dec. 28, 2018, now allowed, which is related to and claims priority to U.S. Provisional Patent Application 62/651,505, filed Apr. 2, 2018, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

Embodiments described herein generally relate to the field of data processing and, more particularly, to video analytics encoding for improved efficiency of video processing and compression.

BACKGROUND

Video streaming has increased in popularity and usage in a wide variety of applications. Many video usages include the application of video analytics to, for example, identify potential elements for processing, such as identifying the potential presence of people, vehicles, dangerous conditions, and other elements that may be present in a scene.

An apparatus or system may include a processing pipeline that is to provide analytical operation in multiple stages, and to handle the transport of the video data and analytics. In a conventional operation, at each stage of the processing pipeline, the result of the video analytics is represented by graphics that are overlaid on video frames. For example, the video analytics may be presented by placing a visible rectangular bounding box within a video frame, compressing the resultant video frame, and sending the compressed frame to the next stage element. Such conventional techniques require video data to be decoded and re-encoded repeatedly to generate compressed bit streams.

This re-encoding of videos is computationally expensive, and encoder latency is considerable, with a direct impact on overall system latency. Further, video quality decreases due to the multiple re-encoding stages in the processing pipeline, where any overlaid graphics representing the metadata may not be removed from the video frame. Moreover, any mistaken detections and/or classifications may not be easily corrected by further elements in the processing pipeline.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments described herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements.

FIGS. 1A and 1B illustrate end-to-end processing of video data according to some embodiments;

FIG. 2 is an illustration of a process for processing of video data by an apparatus or system in an analytics pipeline according to some embodiments;

FIG. 3 is an illustration of video frames in a video stream according to some embodiments;

FIG. 4 is an illustration of video analytics operations according to some embodiments;

FIG. 5 is an illustration of pictures for which video analytics metadata has been generated according to some embodiments;

FIG. 6 illustrates an apparatus or system to provide video analytics encoding according to some embodiments;

FIG. 7 illustrates an apparatus or system including video analytics encoding to provide improved efficiency of video processing and compression according to some embodiments; and

FIG. 8 illustrates a computing device according to some embodiments.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth. However, embodiments, as described herein, may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description.

In some embodiments, an apparatus, system, or process provides for video analytics encoding in an analytics pipeline to provide improved efficiency in video processing and compression.

FIGS. 1A and 1B illustrate end-to-end processing of video data according to some embodiments. An end-to-end intelligent video solution involves processing of video data at various stages in an analytics pipeline, each stage being an apparatus or system. In an example as illustrated in FIG. 1A, video data is captured and initially analyzed by a camera in an edge stage 110, which may include a smart camera or IP (Internet Protocol) camera (referred to in general as an edge camera herein). The analytics pipeline 100 may further have stages including a gateway 120, such as a video streaming box, video scoring server, or network video recorder; an edge cloud 130, such as a video DL (Deep Learning) training server, video scoring server, or video storage streaming server; and cloud servers 140, shown as a video application server. Also illustrated are the interfaces between the stages, such as a first interface 115 between the edge 110 and the gateway 120, a second interface 125 between the gateway 120 and the edge cloud 130, and a third interface between the edge cloud 130 and the cloud servers 140. It is noted that FIG. 1A illustrates a particular implementation of an analytics pipeline, and a pipeline is not limited to the particular elements shown in FIG. 1A.

In the edge stage 110, the edge camera may provide for data capture and initial data processing. In conventional operation, at each of the multiple stages in the analytics pipeline 100, the video data is decompressed, analyzed using computer vision/machine learning/deep learning methods, and then again compressed and sent to the next stage element. For example, as depicted in FIG. 1A, the edge camera at the edge stage 110 carries out certain minimal analytics, such as a subject identification indicated by a bounding box, with the result of the analytics being overlaid onto the frame and the resultant video being compressed and sent on via interface 115 to the gateway 120. The gateway then is operable to carry out advanced analytics, with the resultant video frames being compressed and sent to the edge cloud 130 for further analysis, continuing through the analytics pipeline 100.

In conventional techniques, at every stage, graphics representing the metadata are overlaid on the video frame. For example, a visible bounding rectangle generated by a camera in the edge stage 110 is inserted in the video frame 145, and the resultant video frame is compressed and sent to the next stage element, as shown in FIG. 1A. This conventional process creates a requirement for decompressing video stream data, analyzing the decompressed data, adding additional analytics to the video frame, and re-compressing the video stream data for transmission to the next stage.

In some embodiments, as illustrated in FIG. 1B, the processing and transfer of video data in an analytics pipeline is modified to streamline the processing of video stream data. As shown in FIG. 1B, a video analytics pipeline may include stages such as an edge 160, a gateway 170, an edge cloud 180, and cloud servers 190, together with interfaces between the stages, such as a first interface 165 between the edge 160 and the gateway 170, a second interface 175 between the gateway 170 and the edge cloud 180, and a third interface between the edge cloud 180 and the cloud servers 190. In some embodiments, an apparatus, system, or process includes the creation of metadata to encode video data analytics, and the transfer of the metadata together with the compressed video stream data, wherein the video stream may be encoded as, for example, coded slices in the VCL (Video Coding Layer), to the next stage in the analytics pipeline. In this manner, the generated metadata is made available to each stage by encoding the analytics data as metadata that can be accessed separately from the video streaming data. In a particular embodiment, the metadata is encoded in the form of a Supplemental Enhancement Information (SEI) message, which is carried in the Network Abstraction Layer (NAL) of the H.264 and H.265 (HEVC—High Efficiency Video Coding) standards. A new SEI message may be defined, referred to herein as the Object Tracking SEI, to encode the result of the video analytics, as depicted in FIG. 3.
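For purposes of illustration only, the following sketch shows one way analytics metadata might be carried beside the coded video as an SEI NAL unit appended to an H.264 Annex B byte stream, leaving the VCL data untouched. This is a minimal sketch rather than the embodiment's actual encoder: the use of SEI payload type 5 (user_data_unregistered), the zero-valued UUID, and the metadata byte layout are illustrative assumptions.

    # Minimal sketch: append analytics metadata to an H.264 Annex B stream
    # as a user_data_unregistered SEI NAL unit, leaving VCL NALs intact.
    # Payload type 5 and the 16-byte UUID prefix follow H.264 Annex D; the
    # UUID value and metadata encoding here are illustrative assumptions.

    UUID = bytes(16)  # hypothetical UUID identifying the analytics payload

    def _escape(rbsp: bytes) -> bytes:
        """Insert emulation-prevention bytes (0x03) as required by H.264."""
        out = bytearray()
        zeros = 0
        for b in rbsp:
            if zeros >= 2 and b <= 0x03:
                out.append(0x03)
                zeros = 0
            out.append(b)
            zeros = zeros + 1 if b == 0x00 else 0
        return bytes(out)

    def sei_nal(metadata: bytes) -> bytes:
        payload = UUID + metadata
        body = bytearray([0x05])               # payloadType = 5 (user data)
        size = len(payload)
        while size >= 255:                     # payloadSize in 255-byte chunks
            body.append(0xFF)
            size -= 255
        body.append(size)
        body += payload
        body.append(0x80)                      # rbsp_trailing_bits
        header = bytes([0x06])                 # nal_unit_type = 6 (SEI)
        return b"\x00\x00\x00\x01" + header + _escape(bytes(body))

    def append_analytics(bitstream: bytes, metadata: bytes) -> bytes:
        # The compressed VCL bytes pass through unchanged; only SEI is added.
        return bitstream + sei_nal(metadata)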

In some embodiments, at stages in an analytics pipeline other than the edge 160, such as the gateway 170 and edge cloud servers 180, the compressed video may be decoded and further analyzed, with the results of the video analytics being added in the bitstream as an object tracking SEI message. In some embodiments, if an object tracking SEI message already exists in the bitstream from an earlier stage in the pipeline, then the existing SEI message contents may be modified based on the analytics performed at the later stage.

In some embodiments, one or more apparatuses or systems in a video analytics pipeline 150 provide one or more of computer vision (referring in general to a computing system obtaining high-level understanding from digital images or video), machine learning (referring in general to a computing system learning based on a set of data, and which may include a neural network), and deep learning (referring in general to machine learning using deep neural networks).

FIG. 2 is an illustration of a process for processing of video data by an apparatus or system in an analytics pipeline according to some embodiments. In a video analytics pipeline, such as the analytics pipeline 150 illustrated in FIG. 1B, an apparatus or system may provide for efficient video data analysis as shown in FIG. 2. For an edge device or system, such as an edge camera in the edge stage 160 illustrated in FIG. 1B, the edge device or system may operate to capture video of a scene and generate the resulting video stream data 205. The process then may continue with performing video analytics 220, which would generally be limited analysis in an edge device or system, such as the initial identification of one or more subjects in an image. In some embodiments, the edge device or system is to insert the generated video analytics data as metadata in a message 225, wherein the message may be an SEI message. In some embodiments, for an edge device that has generated the video streaming data, the processing may proceed with compressing the video stream 230 and transporting the compressed video streaming data, such as in the VCL, together with the metadata, to a next stage in the analytics pipeline 240.
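A sketch of this edge-side flow (operations 205 through 240 of FIG. 2) might look as follows; the camera, detector, encoder, and link interfaces are hypothetical placeholders, and append_analytics refers to the earlier sketch. The point illustrated is the ordering: analytics are serialized as metadata sent beside the compressed stream rather than drawn into the frames.

    # Hypothetical edge-stage flow per FIG. 2: capture (205), analyze (220),
    # wrap analytics as SEI metadata (225), compress (230), transport (240).
    import json

    def edge_stage(camera, detector, encoder, link):
        for frame in camera.frames():                  # 205: capture video
            objects = detector.detect(frame)           # 220: limited analytics
            metadata = json.dumps(                     # 225: metadata message
                [{"label": o.label, "bbox": o.bbox} for o in objects]
            ).encode()
            compressed = encoder.encode(frame)         # 230: compress VCL data
            link.send(append_analytics(compressed, metadata))  # 240: transport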

Continuing the process illustrated in FIG. 2, in some embodiments, a device or system receiving video streaming data in an analytics pipeline, such as the gateway 170 or edge cloud 180 stages illustrated in FIG. 1B, may receive the compressed video streaming data and the metadata from an earlier stage 210, such as an edge device or system transporting the video streaming data and metadata 240. The process may continue with decoding the compressed video stream 215 and performing video analytics on the video data 220. In the case of a gateway 170 or edge cloud 180 device or system, the analytics may include more complex analytics than are performed by an edge device. In some embodiments, the device or system is to insert the video analytics into metadata 225, such as in an SEI message. In some embodiments, the apparatus or system is to insert the metadata into an existing SEI message that was received from an earlier stage in the analytics pipeline. In some embodiments, if there are no errors or other conditions that would require the re-compressing of the video streaming data 230, the apparatus or system may then transport the received compressed video streaming data and the metadata to a following stage in the analytics pipeline 240. In this manner, the apparatus or system is not required in normal circumstances to re-compress the data stream for transmission, because of the separation of the analytics data from the video streaming data.
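The property that lets a receiving stage avoid re-encoding is that SEI and VCL NAL units are syntactically separable. The sketch below, again illustrative only, splits an Annex B stream on start codes and classifies NAL units so that a gateway can rewrite only the SEI while forwarding the coded slices byte-for-byte; the H.264 nal_unit_type value of 6 for SEI is standard, while the overall flow and the sei_nal helper from the earlier sketch are assumptions.

    import re

    # Split an Annex B byte stream into NAL units on 3- or 4-byte start codes.
    START_CODE = re.compile(b"\x00\x00(?:\x01|\x00\x01)")

    def nal_units(stream: bytes):
        matches = list(START_CODE.finditer(stream))
        for i, m in enumerate(matches):
            end = matches[i + 1].start() if i + 1 < len(matches) else len(stream)
            yield stream[m.end():end]

    def gateway_forward(stream: bytes, new_metadata: bytes) -> bytes:
        """Rewrite only the SEI; forward VCL NAL units without re-encoding."""
        out = bytearray()
        for nal in nal_units(stream):
            nal_type = nal[0] & 0x1F        # H.264 nal_unit_type
            if nal_type == 6:               # simplified: drop all prior SEI,
                continue                    # since it will be rewritten below
            out += b"\x00\x00\x00\x01" + nal
        out += sei_nal(new_metadata)        # from the earlier sketch
        return bytes(out)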

Additional details that may apply in the generation, analysis, and encoding of video data are illustrated in FIG. 4 below.

FIG. 3 is an illustration of video frames in a video stream according to some embodiments. As illustrated in FIG. 3, in some embodiments an initial frame 310 of a video stream 300 may include data 315 with an SPS (Sequence Parameter Set) and a PPS (Picture Parameter Set), and further including a coded slice carrying video data and an object tracking SEI message carrying metadata, wherein the metadata includes video analytics data generated by an apparatus or system in an analytics pipeline, such as the analytics pipeline 150 illustrated in FIG. 1B. Each following frame 325 then includes a coded slice carrying video data and an object tracking SEI message carrying metadata. In some embodiments, an apparatus or system then is capable of separately accessing the video data and the object tracking SEI containing video analytics data in the form of metadata, thus enabling the handling of video analytics without requiring re-encoding of the video data at each interface between stages.

FIG. 4 is an illustration of video analytics operations according to some embodiments. In some embodiments, a scene 405 is captured by an edge camera 410 to generate video data, wherein the camera may optionally provide limited analytics of the video data, such as to track objects in the video, identify a rectangular bounding box around the objects in the video frame, and provide for the compression of the video data for transmission. In some embodiments, the edge camera 410 generates an SEI message with metadata representing the results of the limited analytics, such that metadata representing one or more rectangular bounding boxes is included in an object tracking SEI message as part of the video bitstream, which can be updated for individual pictures. This results in a compressed stream of data plus the SEI metadata 420 being received at the video gateway 425.

At later stages, such as the gateway or cloud servers, the video portion of the bit stream (e.g., the VCL) that had been originally created by an initial encoder, such as at the edge, does not need to be altered. At these later stages, video analytics can be performed, such as to detect and track objects and identify a rectangular bounding box of the objects in each picture. The result of the video analytics is represented in the SEI message in the Network Abstraction Layer (NAL of the H.264 and H.265 standards), which is either appended to the VCL portion of the bit stream or, if an object tracking SEI message were already included in the received bit stream from the edge camera, the contents of the SEI message are revised as needed. The video gateway then may provide more complex video analytics that are to be represented by additional SEI metadata, resulting in a compressed stream of data plus SEI metadata to the cloud server 430.

Thus, an embodiment of an apparatus, system, or process requires only minimal video processing at a gateway and servers, the processing being to encode the SEI metadata. The video processing is considerably faster, with less power consumption, compared to a conventional flow. At edge and gateway servers, because the result of the video analytics is known to the CODEC layer, an encoder may use this information to efficiently process the regions of interest in the frames. Any misdetections and/or misclassifications may be rectified by a next element in the processing pipeline, as this rectification involves altering only the respective SEI message. This may be contrasted to a conventional system, in which analytic data is inserted into the video frame and thus requires modification of the video frame to address issues such as misdetections and misclassifications.

In some embodiments, at a gateway or server, in the event of packet loss the object tracking SEI information may further be used by a decoder for efficient error concealment. Further, in an embodiment, standard-compliant bit streams may be produced with interoperability being maintained, wherein, for example, a third-party decoder may discard the introduced object tracking SEI message.

In a specific example, the following sequence of operations may occur at the gateway (a sketch of these operations follows the list):

(1) The incoming bitstream is decompressed and the decoded video frames are sub-sampled (in accordance with the video and analytics requirements);

(2) The object tracking SEI NAL, if present in the incoming bitstream, is also decoded and the decoded contents are stored;

(3) The sub-sampled video frames are analyzed to detect potential objects of interest and track their locations within the video frame, and the results are compared against the decoded object tracking SEI NAL contents (if present);

(4) In a case in which there are no differences in the comparison, the incoming compressed bit stream is sent to the cloud server for further analysis;

(5) In a case in which there are differences identified in the comparison, the updated contents are inserted into the object tracking SEI NAL, and the resultant object tracking SEI NAL is appended to the video portion of the bit stream and sent to the cloud server for further analysis;

(6) The object tracking SEI NAL is coded with high priority (using the NALU_PRIORITY_HIGH field in the NAL header), wherein, in scenarios where the available bandwidth between the gateway and cloud server is reduced, the gateway drops the video portion and sends only the object tracking SEI NAL. The decoder at the cloud server uses the contents of the object tracking SEI NAL for efficient error concealment;

and

(7) There are certain scenarios where re-encoding may be needed at the gateway, such as when multiple streams are decompressed and composited to form a single frame in the case of a network video recorder (NVR). In this scenario, the encoder can use the object tracking SEI NAL contents for efficient video processing (such as allocating more bits to the ROIs identified by the video analytics, as an example).
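A sketch of operations (1) through (6) is provided below. The decoder, analyzer, and link interfaces and the helpers decode_object_tracking_sei, encode_object_tracking_sei, and strip_sei (essentially the demultiplexing shown in the earlier sketch) are hypothetical placeholders; the re-encoding path of operation (7) is omitted.

    # Hypothetical gateway flow per operations (1)-(6); illustrative only.
    def gateway_stage(stream, decoder, analyzer, link):
        frames = decoder.decode_all(stream)               # (1) decompress
        frames = [f.subsample() for f in frames]          #     and sub-sample
        prior = decode_object_tracking_sei(stream)        # (2) decode stored SEI
        results = analyzer.track(frames)                  # (3) detect and track
        if results == prior:                              # (4) no differences:
            link.send(stream)                             #     forward unchanged
            return
        sei = encode_object_tracking_sei(results,         # (5) updated contents,
                                         priority_high=True)  # (6) high priority
        if link.bandwidth_low():                          # (6) reduced bandwidth:
            link.send(sei)                                #     send only the SEI
        else:
            link.send(strip_sei(stream) + sei)            # (5) append updated SEI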

One embodiment of the syntax of the novel object tracking SEI message NAL is shown below in Table 1, and the corresponding semantics information is also provided. In one embodiment, the proposed SEI message carries parameters to describe the bounding box of tracked objects within the compressed video bitstream; in another embodiment, object labels and confidence levels of detected and tracked objects may be provided. The syntax uses persistence of parameters to avoid the need to re-signal information already available in previous SEI messages within the same persistence scope, such as within the same coded video sequence. For example, if one tracked object remains stationary for the current picture relative to a previously coded picture while another tracked object moves during that interval, the bounding box parameters are signaled only for the moving object.

A syntax flag is included to indicate if a coded video sequence is not intended for user viewing, but rather is intended for machine learning applications. For example, this flag could be used when the tracked objects are represented at high quality but areas outside of the tracked objects are represented at very low video quality. Another syntax flag is included to indicate if the motion information (e.g., motion vectors, modes, etc.) was selected in order to accurately track object motion, rather than to optimize coding efficiency. Further, a syntax flag indicates if bounding boxes may represent the estimated position of occluded or partially occluded objects (versus representing only the visible portion). A flag per tracked object may optionally indicate if the bounding box represents the size and location of an object that is only partially visible within the coded picture. The number of bits of granularity of the confidence level is explicitly signaled, such as up to 16 bits.

TABLE 1
Annotated Region SEI Message Syntax

    annotated_region( payloadSize ) {                                          Descriptor
        ar_cancel_flag                                                         u(1)
        ar_seq_parameter_set_id                                                ue(v)
        ar_not_optimized_for_viewing_flag                                      u(1)
        ar_true_motion_flag                                                    u(1)
        ar_occluded_objects_flag                                               u(1)
        ar_partial_object_flag_present_flag                                    u(1)
        ar_object_label_present_flag                                           u(1)
        ar_object_detection_confidence_info_present_flag                       u(1)
        if( ar_object_detection_confidence_info_present_flag )
            ar_object_detection_confidence_precision_num_bits                  u(4)
        if( ar_object_label_present_flag ) {
            ar_object_label_language_present_flag                              u(1)
            if( ar_object_label_language_present_flag ) {
                while( !byte_aligned( ) )
                    ar_zero_bit /* equal to 0 */                               f(1)
                ar_object_label_language                                       st(v)
            }
            ar_num_cancel_labels                                               ue(v)
            for( i = 0; i < ar_num_cancel_labels; i++ )
                ar_cancel_label_idx[ i ]                                       ue(v)
            ar_num_new_labels                                                  ue(v)
            for( i = 0; i < ar_num_new_labels; i++ ) {
                ar_label_idx[ i ]                                              ue(v)
                while( !byte_aligned( ) )
                    ar_zero_bit /* equal to 0 */                               f(1)
                ar_label[ ar_label_idx[ i ] ]                                  st(v)
            }
        }
        ar_num_cancel_objects                                                  ue(v)
        for( i = 0; i < ar_num_cancel_objects; i++ )
            ar_cancel_object_idx[ i ]                                          ue(v)
        ar_num_objects_minus1                                                  ue(v)
        for( i = 0; i <= ar_num_objects_minus1; i++ ) {
            ar_object_idx[ i ]                                                 ue(v)
            ar_new_object_flag[ ar_object_idx[ i ] ]                           u(1)
            if( !ar_new_object_flag[ ar_object_idx[ i ] ] )
                ar_object_bounding_box_update_flag[ ar_object_idx[ i ] ]       u(1)
            if( ar_new_object_flag[ ar_object_idx[ i ] ] && ar_object_label_present_flag )
                ar_object_label_idc[ ar_object_idx[ i ] ]                      ue(v)
            if( ar_partial_object_flag_present_flag )
                ar_partial_object_flag[ ar_object_idx[ i ] ]                   u(1)
            if( ar_object_bounding_box_update_flag[ ar_object_idx[ i ] ] || ar_new_object_flag[ ar_object_idx[ i ] ] ) {
                ar_object_top[ ar_object_idx[ i ] ]                            u(16)
                ar_object_left[ ar_object_idx[ i ] ]                           u(16)
                ar_object_width[ ar_object_idx[ i ] ]                          u(16)
                ar_object_height[ ar_object_idx[ i ] ]                         u(16)
                if( ar_object_detection_confidence_info_present_flag )
                    ar_object_detection_confidence[ ar_object_idx[ i ] ]       u(v)
            }
        }
    }
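To make the descriptors of Table 1 concrete, the sketch below reads the leading fixed-length u(n) flags and the ue(v) element ar_seq_parameter_set_id from a raw byte payload. It parses only the prefix of the syntax and assumes emulation-prevention bytes have already been removed; it is illustrative, not a normative parser.

    class BitReader:
        """MSB-first bit reader over an RBSP (emulation bytes removed)."""
        def __init__(self, data: bytes):
            self.data, self.pos = data, 0
        def u(self, n: int) -> int:            # fixed-length unsigned, u(n)
            val = 0
            for _ in range(n):
                byte = self.data[self.pos // 8]
                val = (val << 1) | ((byte >> (7 - self.pos % 8)) & 1)
                self.pos += 1
            return val
        def ue(self) -> int:                   # exp-Golomb, ue(v)
            zeros = 0
            while self.u(1) == 0:
                zeros += 1
            return (1 << zeros) - 1 + (self.u(zeros) if zeros else 0)

    def parse_annotated_region_prefix(rbsp: bytes) -> dict:
        r = BitReader(rbsp)
        msg = {
            "ar_cancel_flag": r.u(1),
            "ar_seq_parameter_set_id": r.ue(),
            "ar_not_optimized_for_viewing_flag": r.u(1),
            "ar_true_motion_flag": r.u(1),
            "ar_occluded_objects_flag": r.u(1),
            "ar_partial_object_flag_present_flag": r.u(1),
            "ar_object_label_present_flag": r.u(1),
            "ar_object_detection_confidence_info_present_flag": r.u(1),
        }
        if msg["ar_object_detection_confidence_info_present_flag"]:
            msg["ar_object_detection_confidence_precision_num_bits"] = r.u(4)
        return msg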

Annotated Region SEI Message Semantics:

The annotated region SEI message carries parameters to describe annotated regions using bounding boxes representing the size and location of tracked objects within the compressed video bitstream, and also to describe optional elements such as object labels and object detection confidence levels.

ar_cancel_flag equal to 1 indicates that the SEI message cancels the persistence of any previous annotated region SEI message in output order that is associated with one or more primary picture layers to which this SEI message applies. ar_cancel_flag equal to 0 indicates that annotated region information follows.

ar_seq_parameter_set_id indicates and shall be equal to the sps_seq_parameter_set_id value of the active SPS. The value of ar_seq_parameter_set_id shall be in the range of 0 to 15, inclusive.

ar_not_optimized_for_viewing_flag equal to 1 indicates that the decoded picture is not optimized for user viewing, but for other purposes. ar_not_optimized_for_viewing_flag equal to 0 indicates that the decoded picture is optimized for user viewing.

ar_true_motion_flag equal to 1 indicates that the motion information in the coded picture was selected with a goal of accurately representing object motion for annotated objects. ar_true_motion_flag equal to 0 makes no indication about motion vector accuracy of annotated objects.

ar_occluded_objects_flag equal to 1 indicates that the ar_object_top, ar_object_left, ar_object_width, and ar_object_height[ar_object_idx[i]] syntax elements represent the size and location of an object that may not be visible or may be only partially visible in the coded picture. ar_occluded_objects_flag equal to 0 indicates that the ar_object_top, ar_object_left, ar_object_width, and ar_object_height[ar_object_idx[i]] syntax elements represent the size and location of the visible portion of an object within the coded picture.

ar_partial_object_flag_present_flag equal to 1 indicates that ar_partial_object_flag[ar_object_idx[i]] syntax elements are present in the coded bit stream. ar_partial_object_flag_present_flag equal to 0 indicates that ar_partial_object_flag[ar_object_idx[i]] syntax elements are not present in the coded bit stream.

ar_object_label_present_flag equal to 1 indicates that the label information corresponding to the annotated objects is present in the coded bit stream. ar_object_label_present_flag equal to 0 indicates that the label information corresponding to the annotated objects is not present in the coded bit stream.

ar_object_detection_confidence_info_present_flag equal to 1 indicates that ar_object_detection_confidence[ ] is present in the bitstream. ar_object_detection_confidence_info_present_flag equal to 0 indicates that ar_object_detection_confidence[ ] is not present in the bitstream.

ar_object_detection_confidence_precision_num_bits indicates the number of bits used to represent ar_object_detection_confidence[ ].

ar_object_label_language_present_flag equal to 1 indicates that ar_object_label_language is present in the bit stream. ar_object_label_language_present_flag equal to 0 indicates that ar_object_label_language is not present and that the language of the label is unspecified.

ar_zero_bit shall be equal to zero.

ar_object_label_language contains a language tag as specified by IETF RFC 5646 followed by a null termination byte equal to 0x00. The length of the ar_object_label_language syntax element shall be less than or equal to 255 bytes, not including the null termination byte.

ar_num_cancel_labels indicates the number of canceled labels associated with the annotated objects. ar_num_cancel_labels shall be in the range of 0 to 255, inclusive.

ar_cancel_label_idx[i] cancels the persistence of the ar_cancel_label_idx[i]-th label. The value of ar_cancel_label_idx[i] shall be in the range of 0 to 255, inclusive.

ar_num_new_labels indicates the total number of new labels associated with the annotated objects that will be signaled. The value of ar_num_new_labels shall be in the range of 0 to 255, inclusive.

ar_label_idx[i] indicates the index to the label associated with the corresponding annotated object. The value of ar_label_idx[i] shall be in the range of 0 to 255, inclusive.

ar_label[ar_label_idx[i]] contains the label of the bounding box. The length of the ar_label[ar_label_idx[i]] syntax element shall be less than or equal to 255 bytes, not including the null termination byte.

ar_num_cancel_objects indicates the number of canceled annotated objects. ar_num_cancel_objects shall be in the range of 0 to 255, inclusive.

ar_cancel_object_idx[i] cancels the persistence of the ar_cancel_object_idx[i]-th annotated object. The value of ar_cancel_object_idx[i] shall be in the range of 0 to 255, inclusive.

ar_num_objects_minus1 plus 1 indicates the total number of annotated objects being tracked in the current decoded picture. ar_num_objects_minus1 shall be in the range of 0 to 255, inclusive.

ar_object_idx[i] specifies the index of the object present in the list of objects present in the current coded picture. ar_object_idx[i] shall be in the range of 0 to 255, inclusive.

ar_new_object_flag[ar_object_idx[i]] equal to 1 indicates that the corresponding object was not represented in earlier annotated region SEI messages within the persistence scope. ar_new_object_flag[ar_object_idx[i]] equal to 0 indicates that the corresponding object was represented in earlier annotated region SEI messages within the persistence scope.

ar_object_bounding_box_update_flag[ar_object_idx[i]] equal to 1 indicates that the bounding box of the corresponding object has been changed from the values represented in earlier annotated region SEI messages within the persistence scope. ar_object_bounding_box_update_flag[ar_object_idx[i]] equal to 0 indicates that the bounding box of the corresponding object persists from earlier annotated region SEI messages within the persistence scope.

ar_object_label_idc[ar_object_idx[i]] specifies the index of the label corresponding to the object. The value of ar_object_label_idc[ar_object_idx[i]] persists from earlier annotated region SEI messages within the persistence scope. If ar_object_label_idc[ar_object_idx[i]] was not present in earlier annotated region SEI messages within the persistence scope, its value is undefined.

ar_partial_object_flag[ar_object_idx[i]] equal to 1 indicates that the ar_object_top, ar_object_left, ar_object_width, and ar_object_height[ar_object_idx[i]] syntax elements represent the size and location of an object that is only partially visible within the coded picture. ar_partial_object_flag[ar_object_idx[i]] equal to 0 indicates that the ar_object_top, ar_object_left, ar_object_width, and ar_object_height[ar_object_idx[i]] syntax elements represent the size and location of an object that is fully visible within the coded picture.

ar_object_top[ar_object_idx[i]] and ar_object_left[ar_object_idx[i]] specify, as luma samples, the top and left location, respectively, of the ar_object_idx[i]-th object in the decoded picture. The value of ar_object_left[ar_object_idx[i]] shall be in the range of 0 to pic_width_in_luma_samples, inclusive, and the value of ar_object_top[ar_object_idx[i]] shall be in the range of 0 to pic_height_in_luma_samples, inclusive. The values of ar_object_top[ar_object_idx[i]] and ar_object_left[ar_object_idx[i]] persist from earlier annotated region SEI messages within the persistence scope. If ar_object_top[ar_object_idx[i]] or ar_object_left[ar_object_idx[i]] were not present in earlier annotated region SEI messages within the persistence scope, their values are undefined.

ar_object_width[ar_object_idx[i]] and ar_object_height[ar_object_idx[i]] specify, as luma samples, the width and height, respectively, of the ar_object_idx[i]-th object in the decoded picture. When ar_partial_object_flag_present_flag is 0, the value of ar_object_left[ar_object_idx[i]] + ar_object_width[ar_object_idx[i]] shall be in the range of 0 to pic_width_in_luma_samples, inclusive, and the value of ar_object_top[ar_object_idx[i]] + ar_object_height[ar_object_idx[i]] shall be in the range of 0 to pic_height_in_luma_samples, inclusive. The values of ar_object_width[ar_object_idx[i]] and ar_object_height[ar_object_idx[i]] persist from earlier annotated region SEI messages within the persistence scope. If ar_object_width[ar_object_idx[i]] or ar_object_height[ar_object_idx[i]] was not present in earlier annotated region SEI messages within the persistence scope, their values are undefined.

ar_object_detection_confidence[ar_object_idx[i]] specifies the confidence associated with the ar_object_idx[i]-th object, in units of 2^(ar_object_detection_confidence_precision_num_bits). The length of the ar_object_detection_confidence[ar_object_idx[i]] syntax element is ar_object_detection_confidence_precision_num_bits bits. The value of ar_object_detection_confidence[ar_object_idx[i]] persists from earlier annotated region SEI messages within the persistence scope. If ar_object_detection_confidence[ar_object_idx[i]] was not present in earlier annotated region SEI messages within the persistence scope, its value is undefined.

FIG. 5 is an illustration of pictures for which video analytics metadata has been generated according to some embodiments. FIG. 5 illustrates a sequence of three pictures in a video stream, the pictures being Picture 0, Picture 1, and Picture 2.

Picture 0—At Picture 0, two objects are present in the image, the two objects being a car and a person. Picture 0 key syntax may include the following:

TABLE 2
Picture 0 Syntax

    ar_object_label_present_flag                 1
    ar_num_new_labels                            2
    ar_label_idx[ 0 ]                            0
    ar_label_idx[ 1 ]                            1
    ar_label[ 0 ]                                Car
    ar_label[ 1 ]                                Person
    ar_num_objects_minus1                        1
    ar_object_idx[ 0 ]                           0
    ar_new_object_flag[ 0 ]                      1
    ar_object_idx[ 1 ]                           1
    ar_new_object_flag[ 1 ]                      1
    ar_object_label_idc[ 0 ]                     0
    ar_object_label_idc[ 1 ]                     1
    ar_object_top, left, width, height[ 0 ]      BB_A
    ar_object_top, left, width, height[ 1 ]      BB_B

Picture 1—At Picture 1, the car (Object 0) has remained in the same position, while the person (Object 1) has moved to a new position. Picture 1 key syntax may include the following:

TABLE 3
Picture 1 Syntax

    ar_object_label_present_flag                 1
    ar_num_new_labels                            0
    ar_num_objects_minus1                        1
    ar_object_idx[ 0 ]                           0
    ar_new_object_flag[ 0 ]                      0
    ar_object_bounding_box_update_flag[ 0 ]      0
    ar_object_idx[ 1 ]                           1
    ar_new_object_flag[ 1 ]                      0
    ar_object_bounding_box_update_flag[ 1 ]      1
    ar_object_top, left, width, height[ 1 ]      BB_C

In such syntax, the position of the car (Object 0) persists from Picture 0 as BB_A.

Picture 2—At Picture 2, the first car (Object 0) is no longer in the picture, the person (Object 1) has moved within the picture, a different car (Object 2) has entered the picture, and a dog (Object 3) has entered the picture. Picture 2 key syntax may include the following:

TABLE 4
Picture 2 Syntax

    ar_object_label_present_flag                 1
    ar_num_new_labels                            1
    ar_label_idx[ 0 ]                            2
    ar_label[ 2 ]                                dog
    ar_num_objects_minus1                        2
    ar_object_idx[ 0 ]                           1
    ar_new_object_flag[ 1 ]                      0
    ar_object_bounding_box_update_flag[ 1 ]      1
    ar_object_idx[ 1 ]                           2
    ar_new_object_flag[ 2 ]                      1
    ar_object_label_idc[ 2 ]                     0
    ar_object_idx[ 2 ]                           3
    ar_new_object_flag[ 3 ]                      1
    ar_object_label_idc[ 3 ]                     2
    ar_object_top, left, width, height[ 1 ]      BB_D
    ar_object_top, left, width, height[ 2 ]      BB_E
    ar_object_top, left, width, height[ 3 ]      BB_F
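The persistence behavior of Tables 2 through 4 can be modeled as a fold of per-picture messages over a running state: labels and objects persist until canceled or updated. The following minimal sketch assumes dictionary-shaped messages rather than real coded syntax, and assumes (as one possibility) that the departed car of Picture 2 is removed via object cancelation.

    # Model of persistence across Pictures 0-2: each SEI carries only deltas;
    # unchanged objects (the car at BB_A in Picture 1) keep prior values.
    def apply_sei(state, msg):
        labels = dict(state.get("labels", {}))
        objects = dict(state.get("objects", {}))
        for idx in msg.get("cancel_labels", []):
            labels.pop(idx, None)
        labels.update(msg.get("new_labels", {}))
        for idx in msg.get("cancel_objects", []):
            objects.pop(idx, None)
        for idx, update in msg.get("objects", {}).items():
            merged = dict(objects.get(idx, {}))
            merged.update(update)          # only signaled fields change
            objects[idx] = merged
        return {"labels": labels, "objects": objects}

    state = {}
    pictures = [
        {"new_labels": {0: "Car", 1: "Person"},              # Picture 0
         "objects": {0: {"label": 0, "bbox": "BB_A"},
                     1: {"label": 1, "bbox": "BB_B"}}},
        {"objects": {1: {"bbox": "BB_C"}}},                  # Picture 1
        {"new_labels": {2: "dog"},                           # Picture 2
         "cancel_objects": [0],            # assumed: departed car canceled
         "objects": {1: {"bbox": "BB_D"},
                     2: {"label": 0, "bbox": "BB_E"},        # new car
                     3: {"label": 2, "bbox": "BB_F"}}},      # dog
    ]
    for msg in pictures:
        state = apply_sei(state, msg)
    # After Picture 1, object 0 still reports bbox BB_A via persistence.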

FIG. 6 illustrates an apparatus or system to provide video analytics encoding according to some embodiments. Apparatus or system 600 represents a communication and data processing device including but not limited to a smart camera, IP camera, edge camera, video streaming box, video scoring system, network video recorder, video DL (Deep Learning) training server, video scoring server, video storage streaming server, video application server, or other apparatus or system of a video analysis pipeline. In some embodiments, apparatus or system 600 may include (without limitation) autonomous machines or artificially intelligent agents, such as mechanical agents or machines, electronic agents or machines, virtual agents or machines, electro-mechanical agents or machines, etc.

In some embodiments, an apparatus or system 600 includes one or more processors 605 (which may include one or more CPUs (Central Processing Units)) having one or more processor cores, and may include one or more GPUs 610 having one or more graphics processor cores, wherein the GPUs 610 may be included within or separate from the one or more processors 605. However, embodiments are not limited to this particular processing structure. In some embodiments, the apparatus or system 600 further includes a memory 615 to store data, including video data.

In some embodiments, the apparatus or system 600 includes video processing elements 620, which may include, but are not limited to, a video decoder 625 that may provide for decompression of compressed video data. Video processing 620 may further include a video analytics mechanism 630, which may include video analytics circuitry, to provide analysis of video data. In some embodiments, the video analysis is to be inserted as metadata into an SEI message. Video processing 620 may further include a video encoder 635 to provide for compression of video data. Video processing 620 may further include a transport mechanism 640, which may include transport circuitry, to perform transmission of video data and associated metadata with video analysis. In some embodiments, the apparatus or system 600 may be an edge camera including an imaging mechanism 645, wherein the imaging mechanism may include elements such as one or more lenses and an image sensor. Other details regarding the components of an imaging mechanism are beyond the scope of the present application and are not presented here.

In some embodiments, the apparatus or system 600, such as an apparatus or system receiving data from a preceding apparatus or system in a video analytics pipeline, may receive a compressed video stream 650 for decoding and analysis by the video processing elements 620.

In some embodiments, the apparatus or system 600, such as an apparatus or system transmitting data to a succeeding apparatus or system in a video analytics pipeline, may transmit a compressed video stream 660 for handling by the succeeding apparatus or system, which may include decoding and analysis.

In some embodiments, the apparatus or system 600 may further include an edge device, wherein the edge device may be a device to utilize received video data in one or more applications. In some embodiments, the edge device includes an interface to receive data from a preceding apparatus or system in a video analytics pipeline, the data including encoded data for video streaming and a message including video analytics metadata for the encoded data. In some embodiments, the edge device includes the video processing elements 620, wherein the video processing elements 620 include the video decoder to decode the received video data to generate a decoded video stream, and further are to obtain video analytics from the received video analytics metadata. In some embodiments, the edge device is to apply the video analytics to the decoded video stream. In some embodiments, a received message includes video analytics from a plurality of preceding apparatuses or systems in the video analytics pipeline.
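As an illustration of how a consuming edge device might apply received analytics to the decoded stream, the following sketch refreshes its analytics state from SEI NAL units and applies it only at render time, so that the coded video itself is never altered; nal_units is from the earlier sketch, and parse_object_tracking_sei, the decoder, and the renderer are hypothetical placeholders.

    # Hypothetical consumer-side flow: refresh analytics state from SEI NAL
    # units and apply it (for example, drawing bounding boxes) only at
    # display time; the coded video was never modified upstream.
    def consume(stream: bytes, decoder, renderer):
        analytics = None
        for nal in nal_units(stream):                 # from the earlier sketch
            if (nal[0] & 0x1F) == 6:                  # SEI: update analytics
                analytics = parse_object_tracking_sei(nal)  # hypothetical
            else:
                frame = decoder.decode(nal)           # VCL and parameter sets
                if frame is not None and analytics is not None:
                    renderer.draw_boxes(frame, analytics)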

FIG. 7 illustrates an apparatus or system including video analytics encoding to provide improved efficiency of video processing and compression according to some embodiments. For example, in one embodiment, a video analytics coding mechanism 710 of FIG. 7 may be employed or hosted by an apparatus or system 700, such as computing device 800 of FIG. 8. Apparatus or system 700 represents a communication and data processing device including but not limited to a smart camera, IP camera, edge camera, video streaming box, video scoring system, network video recorder, video DL (Deep Learning) training server, video scoring server, video storage streaming server, video application server, or other apparatus or system of a video analysis pipeline. In some embodiments, apparatus or system 700 may include (without limitation) autonomous machines or artificially intelligent agents, such as mechanical agents or machines, electronic agents or machines, virtual agents or machines, electro-mechanical agents or machines, etc.

Further, for example, apparatus or system 700 may include a computer platform hosting an integrated circuit (“IC”), such as a system on a chip (“SoC” or “SOC”), integrating various hardware and/or software components of apparatus or system 700 on a single chip.

As illustrated, in one embodiment, apparatus or system 700 may include any number and type of hardware and/or software components, such as (without limitation) graphics processing unit 714 (“GPU” or simply “graphics processor”), graphics driver 716 (also referred to as “GPU driver”, “graphics driver logic”, “driver logic”, user-mode driver (UMD), user-mode driver framework (UMDF), or simply “driver”), central processing unit 712 (“CPU” or simply “application processor”), memory 704, network devices, drivers, or the like, as well as input/output (IO) sources 708, such as touchscreens, touch panels, touch pads, virtual or regular keyboards, virtual or regular mice, ports, connectors, etc. Apparatus or system 700 may include an operating system (OS) serving as an interface between hardware and/or physical resources of apparatus or system 700 and a user.

It is to be appreciated that a lesser or more equipped system than the example described above may be preferred for certain implementations. Therefore, the configuration of apparatus or system 700 may vary from implementation to implementation depending upon numerous factors, such as price constraints, performance requirements, technological improvements, or other circumstances.

Embodiments may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a system board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The terms “logic”, “module”, “component”, “engine”, and “mechanism” may include, by way of example, software or hardware and/or a combination thereof, such as firmware.

In one embodiment, video analytics coding mechanism 710 may be hosted by memory 704 of apparatus or system 700. In another embodiment, video analytics coding mechanism 710 may be hosted by or be part of operating system 706 of apparatus or system 700. In another embodiment, video analytics coding mechanism 710 may be hosted or facilitated by graphics driver 716. In yet another embodiment, video analytics coding mechanism 710 may be hosted by or part of graphics processing unit 714 (“GPU” or simply “graphics processor”) or firmware of graphics processor 714. For example, video analytics coding mechanism 710 may be embedded in or implemented as part of the processing hardware of graphics processor 714. Similarly, in yet another embodiment, video analytics coding mechanism 710 may be hosted by or part of central processing unit 712 (“CPU” or simply “application processor”). For example, video analytics coding mechanism 710 may be embedded in or implemented as part of the processing hardware of application processor 712.

In yet another embodiment, video analytics coding mechanism 710 may be hosted by or part of any number and type of components of apparatus or system 700, such that a portion of video analytics coding mechanism 710 may be hosted by or part of operating system 706, another portion may be hosted by or part of graphics processor 714, another portion may be hosted by or part of application processor 712, while one or more portions of video analytics coding mechanism 710 may be hosted by or part of operating system 706 and/or any number and type of devices of apparatus or system 700. It is contemplated that embodiments are not limited to a certain implementation or hosting of video analytics coding mechanism 710 and that one or more portions or components of video analytics coding mechanism 710 may be employed or implemented as hardware, software, or any combination thereof, such as firmware.

Apparatus or system 700 may host network interface(s) to provide access to a network, such as a LAN, a wide area network (WAN), a metropolitan area network (MAN), a personal area network (PAN), Bluetooth, a cloud network, a mobile network (e.g., 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G), etc.), an intranet, the Internet, etc. Network interface(s) may include, for example, a wireless network interface having an antenna, which may represent one or more antenna(e). Network interface(s) may also include, for example, a wired network interface to communicate with remote devices via network cable, which may be, for example, an Ethernet cable, a coaxial cable, a fiber optic cable, a serial cable, or a parallel cable.

Embodiments may be provided, for example, as a computer program product which may include one or more machine-readable media (including a non-transitory machine-readable or computer-readable storage medium) having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines carrying out operations in accordance with embodiments described herein. A machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (Compact Disc-Read Only Memories), magneto-optical disks, ROMs, RAMs, EPROMs (Erasable Programmable Read Only Memories), EEPROMs (Electrically Erasable Programmable Read Only Memories), magnetic tape, magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.

Moreover, embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection).

Throughout the document, the term “user” may be interchangeably referred to as “viewer”, “observer”, “speaker”, “person”, “individual”, “end-user”, and/or the like. It is to be noted that throughout this document, terms like “graphics domain” may be referenced interchangeably with “graphics processing unit”, “graphics processor”, or simply “GPU”, and similarly, “CPU domain” or “host domain” may be referenced interchangeably with “computer processing unit”, “application processor”, or simply “CPU”.

It is to be noted that terms like “node”, “computing node”, “server”, “server device”, “cloud computer”, “cloud server”, “cloud server computer”, “machine”, “host machine”, “device”, “computing device”, “computer”, “computing system”, and the like, may be used interchangeably throughout this document. It is to be further noted that terms like “application”, “software application”, “program”, “software program”, “package”, “software package”, and the like, may be used interchangeably throughout this document. Also, terms like “job”, “input”, “request”, “message”, and the like, may be used interchangeably throughout this document.

FIG. 8 illustrates a computing device according to some embodiments. It is contemplated that details of computing device 800 may be the same as or similar to details of apparatus or system 700 of FIG. 7, and thus for brevity, certain of the details discussed with reference to apparatus or system 700 of FIG. 7 are not discussed or repeated hereafter. Computing device 800 houses a system board 802 (which may also be referred to as a motherboard, main circuit board, or by other terms). The system board 802 may include a number of components, including but not limited to a processor 804 and at least one communication package or chip 806. The communication package 806 is coupled to one or more antennas 816. The processor 804 is physically and electrically coupled to the board 802.

Depending on its applications, computing device 800 may include other components that may or may not be physically and electrically coupled to the system board 802. These other components include, but are not limited to, volatile memory (e.g., DRAM) 808, nonvolatile memory (e.g., ROM) 809, flash memory (not shown), a graphics processor 812, a digital signal processor (not shown), a crypto processor (not shown), a chipset 814, an antenna 816, a display 818 such as a touchscreen display, a touchscreen controller 820, a battery 822, an audio codec (not shown), a video codec (not shown), a power amplifier 824, a global positioning system (GPS) device 826, a compass 828, an accelerometer (not shown), a gyroscope (not shown), a speaker or other audio element 830, one or more cameras 832, a microphone array 834, and a mass storage device (such as a hard disk drive) 810, compact disk (CD) (not shown), digital versatile disk (DVD) (not shown), and so forth. These components may be connected to the system board 802, mounted to the system board, or combined with any of the other components.

The communication package 806 enables wireless and/or wired communications for the transfer of data to and from the computing device 800. The term “wireless” and its derivatives may be used to describe circuits, devices, systems, methods, techniques, communications channels, etc., that may communicate data through the use of modulated electromagnetic radiation through a non-solid medium. The term does not imply that the associated devices do not contain any wires, although in some embodiments they might not. The communication package 806 may implement any of a number of wireless or wired standards or protocols, including but not limited to Wi-Fi (IEEE 802.11 family), WiMAX (IEEE 802.16 family), IEEE 802.20, long term evolution (LTE), Ev-DO (Evolution Data Optimized), HSPA+, HSDPA+, HSUPA+, EDGE (Enhanced Data rates for GSM Evolution), GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), TDMA (Time Division Multiple Access), DECT (Digital Enhanced Cordless Telecommunications), Bluetooth, Ethernet, derivatives thereof, as well as any other wireless and wired protocols that are designated as 3G, 4G, 5G, and beyond. The computing device 800 may include a plurality of communication packages 806. For instance, a first communication package 806 may be dedicated to shorter range wireless communications such as Wi-Fi and Bluetooth, and a second communication package 806 may be dedicated to longer range wireless communications such as GSM, EDGE, GPRS, CDMA, WiMAX, LTE, Ev-DO, and others.

The cameras 832, including any depth sensors or proximity sensors, are coupled to an optional image processor 836 to perform conversions, analysis, noise reduction, comparisons, depth or distance analysis, image understanding, and other processes as described herein. The processor 804 is coupled to the image processor 836 to drive the process with interrupts, set parameters, and control operations of the image processor and the cameras. Image processing may instead be performed in the processor 804, the graphics processor 812, the cameras 832, or in any other device.

In various implementations, the computing device 800 may be a laptop, a netbook, a notebook, an Ultrabook, a smartphone, a tablet, a personal digital assistant (PDA), an ultra-mobile PC, a mobile phone, a desktop computer, a server, a set-top box, an entertainment control unit, a digital camera, a portable music player, or a digital video recorder. The computing device may be fixed, portable, or wearable. In further implementations, the computing device 800 may be any other electronic device that processes data or records data for processing elsewhere.

Embodiments may be implemented using one or more memory chips, controllers, CPUs (Central Processing Unit), microchips or integrated circuits interconnected using a motherboard, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The term “logic” may include, by way of example, software or hardware and/or combinations of software and hardware.

The following clauses and/or examples pertain to further embodiments or examples. Specifics in the examples may be used anywhere in one or more embodiments. The various features of the different embodiments or examples may be variously combined with some features included and others excluded to suit a variety of different applications. Examples may include subject matter such as a method, means for performing acts of the method, at least one machine-readable medium including instructions that, when performed by a machine, cause the machine to perform acts of the method, or of an apparatus or system for facilitating hybrid communication according to embodiments and examples described herein.

In some embodiments, an apparatus includes a memory to store data, including data for video streaming; and a video processing mechanism, wherein the video processing mechanism is to analyze video data and generate video analytics, generate metadata representing the video analytics and insert the generated video analytics metadata into a message, and transmit the video data and the metadata to a succeeding apparatus or system in a video analytics pipeline, the video data being compressed video data.

In some embodiments, the apparatus further includes an imaging mechanism to capture images of a scene and generate the video data.

In some embodiments, the video processing mechanism is to compress the video data prior to transmission of the video data.

In some embodiments, the apparatus is an edge device to provide video imaging.

In some embodiments, the apparatus is one of a smart camera or IP (Internet Protocol) camera.

In some embodiments, the video processing mechanism is further to receive a compressed video stream from a preceding apparatus or system in the video analytics pipeline and decode the compressed video stream to generate the video data for analysis.

In some embodiments, transmitting the video data includes transmitting the received compressed video stream without re-compression of the video data.

In some embodiments, the apparatus is one of a gateway or edge device.

In some embodiments, the video processing mechanism is further to receive a message including video analytics metadata from the preceding apparatus or system in the video analytics pipeline, and wherein inserting the generated video analytics data into a message includes inserting the generated video analytics data into the received message.

In some embodiments, parameters within the message persist in a following message, the position of a static existing object being maintained, and only new objects or moving existing objects requiring updating in the later message.

In some embodiments, the metadata includes one or more of a flag to indicate that a decoded image is not optimized for user viewing, a flag to indicate that motion information in an image was selected to accurately represent object motion, or a flag to indicate presence of an object detection confidence value.

In some embodiments, the message is a Supplemental Enhancement Information (SEI) message that is carried in the Network Abstraction Layer (NAL).

In some embodiments, the analysis of the video data includes one or more of computer vision, machine learning, or deep learning.

In some embodiments, a method includes analyzing video data and generating video analytics at an apparatus or system in a video analytics pipeline; generating metadata representing the video analytics and inserting the generated video analytics metadata into a message; and transmitting the video data and the metadata to a succeeding apparatus or system in the video analytics pipeline, the video data being compressed video data.

In some embodiments, the method further includes capturing images of a scene to generate the video data; and compressing the video data prior to transmission of the video data.

In some embodiments, the apparatus or system is an edge device including an imaging mechanism.

In some embodiments, the method further includes receiving a compressed video stream from a preceding apparatus or system in the video analytics pipeline; and decoding the compressed video stream to generate the video data for analysis.

In some embodiments, transmitting the video data includes transmitting the received compressed video stream without re-compression of the video data.

In some embodiments, the method further includes receiving a message including video analytics metadata from the preceding apparatus or system in the video analytics pipeline, wherein inserting the generated video analytics data into a message includes inserting the generated video analytics data into the received message.

In some embodiments, the message is a Supplemental Enhancement Information (SEI) message that is carried in the Network Abstraction Layer (NAL).

In some embodiments, generating the video analytics includes generating one or more labels for objects within the video data, wherein more than one object may share a generated label.

In some embodiments, a system includes one or more processors to processvideo data; a memory to store data, including data for video streaming;and a video processing mechanism including a video analytics mechanismto analyze video data and generate video analytics, a video encoder toencode video data for transmission, and a transport mechanism to performtransmission of video data and other data, wherein the video processingmechanism is to generate metadata representing the video analytics andinsert the generated video analytics metadata into a message andtransmit the video data and the metadata to a succeeding apparatus orsystem in a video analytics pipeline, the video data being compressedvideo data.

In some embodiments, the system further includes an imaging mechanism tocapture images of a scene and generate the video data.

In some embodiments, the video processing mechanism is to compress thevideo data prior to transmission of the video data.

In some embodiments, wherein the video processing mechanism is furtherto receive a compressed video stream from a preceding apparatus orsystem in the video analytics pipeline; and the system further includesa video decoder to decode the compressed video stream and generate thevideo data for analysis.

In some embodiments, transmitting the video data includes transmittingthe received compressed video stream without re-compression of the videodata.

In some embodiments, the video processing mechanism is further toreceive a message including video analytics metadata from the precedingapparatus or system in the video analytics pipeline, and whereininserting the generated video analytics data into a message includesinserting the generated video data into the received message.

In some embodiments, an edge device includes an interface to receive data from a preceding apparatus or system in a video analytics pipeline, the data including encoded data for video streaming and a message including video analytics metadata for the encoded data; and a video processing mechanism, wherein the video processing mechanism is to decode the received video data to generate a decoded video stream and is to obtain video analytics from the video analytics metadata.

In some embodiments, the edge device is to apply the video analytics to the decoded video stream.

In some embodiments, the message includes video analytics from a plurality of preceding apparatuses or systems in the video analytics pipeline.
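
As a final illustration, a terminal display stage might apply the accumulated analytics to the decoded stream only at presentation time, so that no intermediate stage burns graphics into the video frames. The sketch below assumes detection records shaped like the hypothetical FrameAnalytics example above and uses OpenCV's standard drawing calls purely as one possible rendering choice.

    import cv2  # OpenCV, used here only as one plausible rendering choice

    def render_with_overlays(frame, frame_analytics):
        """Draw the accumulated analytics on a decoded frame at display time."""
        for obj in frame_analytics.objects:
            x, y, w, h = obj.bbox
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, f"{obj.label} {obj.confidence:.2f}", (x, y - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        return frame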

In the description above, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the described embodiments. It will be apparent, however, to one skilled in the art that embodiments may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form. There may be intermediate structure between illustrated components. The components described or illustrated herein may have additional inputs or outputs that are not illustrated or described.

Various embodiments may include various processes. These processes may be performed by hardware components or may be embodied in computer program or machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the processes. Alternatively, the processes may be performed by a combination of hardware and software.

Portions of various embodiments may be provided as a computer program product, which may include a computer-readable medium having stored thereon computer program instructions, which may be used to program a computer (or other electronic devices) for execution by one or more processors to perform a process according to certain embodiments. The computer-readable medium may include, but is not limited to, magnetic disks, optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or other type of computer-readable medium suitable for storing electronic instructions. Moreover, embodiments may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer. In some embodiments, a non-transitory computer-readable storage medium has stored thereon data representing sequences of instructions that, when executed by a processor, cause the processor to perform certain operations.

Many of the methods are described in their most basic form, but processes can be added to or deleted from any of the methods and information can be added or subtracted from any of the described messages without departing from the basic scope of the present embodiments. It will be apparent to those skilled in the art that many further modifications and adaptations can be made. The particular embodiments are not provided to limit the concept but to illustrate it. The scope of the embodiments is not to be determined by the specific examples provided above but only by the claims below.

If it is said that an element “A” is coupled to or with element “B,” element A may be directly coupled to element B or be indirectly coupled through, for example, element C. When the specification or claims state that a component, feature, structure, process, or characteristic A “causes” a component, feature, structure, process, or characteristic B, it means that “A” is at least a partial cause of “B” but that there may also be at least one other component, feature, structure, process, or characteristic that assists in causing “B.” If the specification indicates that a component, feature, structure, process, or characteristic “may”, “might”, or “could” be included, that particular component, feature, structure, process, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, this does not mean there is only one of the described elements.

An embodiment is an implementation or example. Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments. The various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments. It should be appreciated that in the foregoing description of exemplary embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various novel aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, novel aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims are hereby expressly incorporated into this description, with each claim standing on its own as a separate embodiment.

1.-20. (canceled)
21. An apparatus comprising: a video processing circuitry to: generate video analytics of video data of a video stream, wherein the video analytics represents analysis of the video data; add the video analytics to metadata associated with the video data; and transmit, via a video analytics pipeline, a message having the video data, the video analytics, and the metadata.

22. The apparatus of claim 21, further comprising an imaging circuitry to capture the video stream having the video data representing images of a scene, wherein the apparatus further comprises memory to store the video data.

23. The apparatus of claim 21, wherein the video processing circuitry is further to receive at least a portion of the video analytics from a first computing device including a preceding computing device, and wherein the message is transmitted to a second computing device including a subsequent computing device, wherein the video analytics identifies one or more elements associated with the scene, wherein the one or more elements include one or more of individuals, vehicles, objects, or conditions associated with the scene.

24. The apparatus of claim 21, wherein the metadata comprises one or more flags identifying information relating to one or more of user viewing optimization, object motion, or object detection confidence value.

25. The apparatus of claim 21, wherein the message comprises a Supplemental Enhancement Information (SEI) message that is carried in a Network Abstraction Layer (NAL).

26. The apparatus of claim 21, wherein the apparatus comprises one or more of an edge device, a gateway device, or a smart image capturing device.

27. A method comprising: generating, by a computing device, video analytics of video data of a video stream, wherein the video analytics represents analysis of the video data; adding the video analytics to metadata associated with the video data; and transmitting a message having the video data, the video analytics, and the metadata.

28. The method of claim 27, further comprising capturing the video stream having the video data representing images of a scene, and storing the video data.

29. The method of claim 27, further comprising receiving at least a portion of the video analytics from a first computing device including a preceding computing device, and wherein the message is transmitted to a second computing device including a subsequent computing device, wherein the video analytics identifies one or more elements associated with the scene, wherein the one or more elements include one or more of individuals, vehicles, objects, or conditions associated with the scene.

30. The method of claim 27, wherein the metadata comprises one or more flags identifying information relating to one or more of user viewing optimization, object motion, or object detection confidence value.

31. The method of claim 27, wherein the message comprises a Supplemental Enhancement Information (SEI) message that is carried in a Network Abstraction Layer (NAL).

32. The method of claim 27, wherein the computing device comprises one or more of an edge device, a gateway device, or a smart image capturing device.

33. A computer-readable medium having stored thereon instructions which, when executed, cause a computing device to perform operations comprising: generating video analytics of video data of a video stream, wherein the video analytics represents analysis of the video data; adding the video analytics to metadata associated with the video data; and transmitting a message having the video data, the video analytics, and the metadata.

34. The computer-readable medium of claim 33, wherein the operations further comprise capturing the video stream having the video data representing images of a scene, and storing the video data.

35. The computer-readable medium of claim 33, wherein the operations further comprise receiving at least a portion of the video analytics from a first computing device including a preceding computing device, and wherein the message is transmitted to a second computing device including a subsequent computing device, wherein the video analytics identifies one or more elements associated with the scene, wherein the one or more elements include one or more of individuals, vehicles, objects, or conditions associated with the scene.

36. The computer-readable medium of claim 33, wherein the metadata comprises one or more flags identifying information relating to one or more of user viewing optimization, object motion, or object detection confidence value.

37. The computer-readable medium of claim 33, wherein the message comprises a Supplemental Enhancement Information (SEI) message that is carried in a Network Abstraction Layer (NAL).

38. The computer-readable medium of claim 33, wherein the computing device comprises one or more of an edge device, a gateway device, or a smart image capturing device.